
RAW SEQUENCE LISTING DATE: 04/23/2002
PATENT APPLICATION: US/10/040,895 TIME: 15:35:39

Input Set : A:\Tb5072.txt
Output Set: N:\CRF3\04232002\J040895.raw

8 <120> TITLE OF INVENTION: Methods for Predicting Functional and
9 Structural Properties of Polypeptides Using Sequence Models

12 <130> FILE REFERENCE: P-TB 5072

14 <140> CURRENT APPLICATION NUMBER: US 10/040,895
C--> 15 <141> CURRENT FILING DATE: 2002-04-09

17 <150> PRIOR APPLICATION NUMBER: US 09/753,020
18 <151> PRIOR FILING DATE: 2000-12-29

20 <160> NUMBER OF SEQ ID NOS: 17

22 <170> SOFTWARE: FastSEQ for Windows Version 4.0

29 <400> SEQUENCE: 1
30 Cys Leu Ile Gly Cys Gly Phe Ser Thr Gly Tyr Gly Ala Ala Val Lys
31 1               5                   10                  15
32 Thr Gly Lys Val Lys Pro Gly Ser Thr Cys Val Val Phe Gly Leu Gly
33             20                  25                  30
34 Gly Val Gly Leu Ser Val Ile Met Gly Cys Lys Ser Ala Gly Ala Ser
35         35                  40                  45
36 Arg Ile Ile Gly Ile Asp Leu Asn Lys Asp Lys Phe Glu Lys Ala Met
37     50                  55                  60
38 Ala Val Gly Ala Thr Glu Cys Ile Ser Pro Lys Asp Ser Thr Lys Pro
39 65                  70                  75                  80
40 Ile Ser Glu Val Leu Ser Glu Met Thr Gly Asn Asn Val Gly Tyr Thr
41                 85                  90                  95
42 Phe Glu Val Ile Gly His Leu Glu Thr Met Ile Asp Ala Leu Ala Ser
43             100                 105                 110
44 Cys His Met Asn Tyr Gly Thr Ser Val Val Val Gly Val Pro Pro Ser
45         115                 120                 125
46 Ala Lys Met Leu Thr Tyr Asp Pro Met Leu Leu Phe Thr Gly Arg Thr
47     130                 135                 140
48 Trp Lys Gly Cys Val Phe Gly Gly Leu Lys Ser
49 145                 150                 155


52 <210> SEQ ID NO: 2
53 <211> LENGTH: 152
54 <212> TYPE: PRT
55 <213> ORGANISM: Equus caballus

57 <400> SEQUENCE: 2
58 Gly Cys Gly Phe Ser Thr Gly Tyr Gly Ser Ala Val Lys Val Ala Lys
59 1               5                   10                  15
60 Val Thr Gln Gly Ser Thr Cys Ala Val Phe Gly Leu Gly Gly Val Gly
61             20                  25                  30
62 Leu Ser Val Ile Met Gly Cys Lys Ala Ala Gly Ala Ala Arg Ile Ile
63         35                  40                  45
64 Gly Val Asp Ile Asn Lys Asp Lys Phe Ala Lys Ala Lys Glu Val Gly
65     50                  55                  60
66 Ala Thr Glu Cys Val Asn Pro Gln Asp Tyr Lys Lys Pro Ile Gln Glu
67 65                  70                  75                  80
68 Val Leu Thr Glu Met Ser Asn Gly Gly Val Asp Phe Ser Phe Glu Val
69                 85                  90                  95
70 Ile Gly Arg Leu Asp Thr Met Val Thr Ala Leu Ser Cys Cys Gln Glu
71             100                 105                 110
72 Ala Tyr Gly Val Ser Val Ile Val Gly Val Pro Pro Asp Ser Gln Asn
73         115                 120                 125
74 Leu Ser Met Asn Pro Met Leu Leu Leu Ser Gly Arg Thr Trp Lys Gly
75     130                 135                 140
76 Ala Ile Phe Gly Gly Phe Lys Ser
77 145                 150

80 <210> SEQ ID NO: 3
81 <211> LENGTH: 175
82 <212> TYPE: PRT
83 <213> ORGANISM: Thermoanaerobium brockii

85 <400> SEQUENCE: 3
86 Val Met Ile Pro Asp Met Met Thr Thr Gly Phe His Gly Ala Glu Leu
87 1               5                   10                  15
88 Ala Asp Ile Glu Leu Gly Ala Thr Val Ala Val Leu Gly Ile Gly Pro
89             20                  25                  30
90 Val Gly Leu Met Ala Val Ala Gly Ala Lys Leu Arg Gly Ala Gly Arg
91         35                  40                  45
92 Ile Ile Ala Val Gly Ser Arg Pro Val Cys Val Asp Ala Ala Lys Tyr
93     50                  55                  60
94 Tyr Gly Ala Thr Asp Ile Val Asn Tyr Lys Asp Gly Pro Ile Glu Ser
95 65                  70                  75                  80
96 Gln Ile Met Asn Leu Thr Glu Gly Lys Gly Val Asp Ala Ala Ile Ile
97                 85                  90                  95
98 Ala Gly Gly Asn Ala Asp Ile Met Ala Thr Ala Val Lys Ile Val Lys
99             100                 105                 110
100 Pro Gly Gly Thr Ile Ala Asn Val Asn Tyr Phe Gly Glu Gly Glu Val
101         115                 120                 125
102 Leu Pro Val Pro Arg Leu Glu Trp Gly Cys Gly Met Ala His Lys Thr
103     130                 135                 140
104 Ile Lys Gly Gly Leu Cys Pro Gly Gly Arg Leu Arg Met Glu Arg Leu
105 145                 150                 155                 160
106 Ile Asp Leu Val Phe Tyr Lys Arg Val Asp Pro Ser Lys Leu Val
107                 165                 170                 175

110 <210> SEQ ID NO: 4
111 <211> LENGTH: 141
112 <212> TYPE: PRT
113 <213> ORGANISM: Lactobacillus confusus

115 <400> SEQUENCE: 4
116 Ala Arg Lys Ile Gly Ile Ile Gly Leu Gly Asn Val Gly Ala Ala Val
117 1               5                   10                  15
118 Ala His Gly Leu Ile Ala Gln Gly Val Ala Asp Asp Tyr Val Phe Ile
119             20                  25                  30
120 Asp Ala Asn Glu Ala Lys Val Lys Ala Asp Gln Ile Asp Phe Gln Asp
121         35                  40                  45
122 Ala Met Ala Asn Leu Glu Ala His Gly Asn Ile Val Ile Asn Asp Trp
123     50                  55                  60
124 Ala Ala Leu Ala Asp Ala Asp Val Val Ile Ser Thr Leu Gly Asn Ile
125 65                  70                  75                  80
126 Lys Leu Gln Gln Phe Ala Glu Leu Lys Phe Thr Ser Ser Met Val Gln
127                 85                  90                  95
128 Ser Val Gly Thr Asn Leu Lys Glu Ser Gly Phe His Gly Val Leu Val
129             100                 105                 110
130 Val Ile Ser Asn Pro Val Asp Val Ile Thr Ala Leu Phe Gln His Val
131         115                 120                 125
132 Thr Gly Phe Pro Ala His Lys Val Ile Gly Thr Gly Thr
133     130                 135                 140

136 <210> SEQ ID NO: 5
137 <211> LENGTH: 147
138 <212> TYPE: PRT
139 <213> ORGANISM: B. stearothermophilus

141 <400> SEQUENCE: 5
142 Met Lys Asn Asn Gly Gly Ala Arg Val Val Val Ile Gly Ala Gly Phe
143 1               5                   10                  15
144 Val Gly Ala Ser Tyr Val Phe Ala Leu Met Asn Gln Gly Ile Ala Asp
145             20                  25                  30
146 Glu Ile Val Leu Ile Asp Ala Asn Glu Ser Lys Ala Ile Gly Asp Ala
147         35                  40                  45
148 Met Asp Phe Asn His Gly Lys Val Phe Ala Pro Lys Pro Val Asp Ile
149     50                  55                  60
150 Trp His Gly Asp Tyr Asp Asp Cys Arg Asp Ala Asp Leu Val Val Ile
151 65                  70                  75                  80
152 Cys Ala Gly Ala Asn Gln Lys Pro Gly Glu Thr Arg Leu Asp Leu Val
153                 85                  90                  95
154 Asp Lys Asn Ile Ala Ile Phe Arg Ser Ile Val Glu Ser Val Met Ala
155             100                 105                 110
156 Ser Gly Phe Gln Gly Leu Phe Leu Val Ala Thr Asn Pro Val Asp Ile
157         115                 120                 125
158 Leu Thr Tyr Ala Thr Trp Lys Phe Ser Gly Leu Pro His Glu Arg Val
159     130                 135                 140
160 Ile Gly Ser
161 145

164 <210> SEQ ID NO: 6
165 <211> LENGTH: 312
166 <212> TYPE: PRT
167 <213> ORGANISM: E. coli

169 <400> SEQUENCE: 6
170 Met Lys Val Ala Val Leu Gly Ala Ala Gly Gly Ile Gly Gln Ala Leu
171 1               5                   10                  15
172 Ala Leu Leu Leu Lys Thr Gln Leu Pro Ser Gly Ser Glu Leu Ser Leu
173             20                  25                  30
174 Tyr Asp Ile Ala Pro Val Thr Pro Gly Val Ala Val Asp Leu Ser His
175         35                  40                  45
176 Ile Pro Thr Ala Val Lys Ile Lys Gly Phe Ser Gly Glu Asp Ala Thr
177     50                  55                  60
178 Pro Ala Leu Glu Gly Ala Asp Val Val Leu Ile Ser Ala Gly Val Arg
179 65                  70                  75                  80
180 Arg Lys Pro Gly Met Asp Arg Ser Asp Leu Phe Asn Val Asn Ala Gly
181                 85                  90                  95
182 Ile Val Lys Asn Leu Val Gln Gln Val Ala Lys Thr Cys Pro Lys Ala
183             100                 105                 110
184 Cys Ile Gly Ile Ile Thr Asn Pro Val Asn Thr Thr Val Ala Ile Ala
185         115                 120                 125
186 Ala Glu Val Leu Lys Lys Ala Gly Val Tyr Asp Lys Asn Lys Leu Phe
187     130                 135                 140
188 Gly Val Thr Thr Leu Asp Ile Ile Arg Ser Asn Thr Phe Val Ala Glu
189 145                 150                 155                 160
190 Leu Lys Gly Lys Gln Pro Gly Glu Val Glu Val Pro Val Ile Gly Gly
191                 165                 170                 175
192 His Ser Gly Val Thr Ile Leu Pro Leu Leu Ser Gln Val Pro Gly Val
193             180                 185                 190
194 Ser Phe Thr Glu Gln Glu Val Ala Asp Leu Thr Lys Arg Ile Gln Asn
195         195                 200                 205
196 Ala Gly Thr Glu Val Val Glu Ala Lys Ala Gly Gly Gly Ser Ala Thr
197     210                 215                 220
198 Leu Ser Met Gly Gln Ala Ala Ala Arg Phe Gly Leu Ser Leu Val Arg
199 225                 230                 235                 240
200 Ala Leu Gln Gly Glu Gln Gly Val Val Glu Cys Ala Tyr Val Glu Gly
201                 245                 250                 255
202 Asp Gly Gln Tyr Ala Arg Phe Phe Ser Gln Pro Leu Leu Leu Gly Lys
203             260                 265                 270
204 Asn Gly Val Glu Glu Arg Lys Ser Ile Gly Thr Leu Ser Ala Phe Glu
205         275                 280                 285
206 Gln Asn Ala Leu Glu Gly Met Leu Asp Thr Leu Lys Lys Asp Ile Ala
207     290                 295                 300
208 Leu Gly Gln Glu Phe Val Asn Lys
209 305                 310

212 <210> SEQ ID NO: 7
213 <211> LENGTH: 163
214 <212> TYPE: PRT
215 <213> ORGANISM: Sus scrofa

217 <400> SEQUENCE: 7
218 Ala Thr Leu Lys Asp Gln Leu Ile His Asn Leu Leu Lys Glu Glu His
219 1               5                   10                  15
220 Val Pro His Asn Lys Ile Thr Val Val Gly Val Gly Ala Val Gly Met
221             20                  25                  30
222 Ala Cys Ala Ile Ser Ile Leu Met Lys Glu Leu Ala Asp Glu Ile Ala
223         35                  40                  45
224 Leu Val Asp Val Met Glu Asp Lys Leu Lys Gly Glu Met Met Asp Leu
225     50                  55                  60
226 Gln His Gly Ser Leu Phe Leu Arg Thr Pro Lys Ile Val Ser Gly Lys
227 65                  70                  75                  80
228 Asp Tyr Asn Val Thr Ala Asn Ser Arg Leu Val Val Ile Thr Ala Gly
229                 85                  90                  95
230 Ala Arg Gln Gln Glu Gly Glu Ser Arg Leu Asn Leu Val Gln Arg Asn
231             100                 105                 110
232 Val Asn Ile Phe Lys Phe Ile Ile Pro Asn Ile Val Lys Tyr Ser Pro
233         115                 120                 125
234 Asn Cys Lys Leu Leu Val Val Ser Asn Pro Val Asp Ile Leu Thr Tyr
235     130                 135                 140
236 Val Ala Trp Lys Ile Ser Gly Phe Pro Lys Asn Arg Val Ile Gly Ser
237 145                 150                 155                 160
238 Gly Cys Asn

242 <210> SEQ ID NO: 8
243 <211> LENGTH: 333
244 <212> TYPE: PRT
245 <213> ORGANISM: Sus scrofa

247 <400> SEQUENCE: 8
248 Ser Glu Pro Ile Arg Val Leu Val Thr Gly Ala Ala Gly Gln Ile Ala
249 1               5                   10                  15
250 Tyr Ser Leu Leu Tyr Ser Ile Gly Asn Gly Ser Val Phe Gly Lys Asp
251             20                  25                  30
252 Gln Pro Ile Ile Leu Val Leu Leu Asp Ile Thr Pro Met Met Gly Val
253         35                  40                  45
254 Leu Asp Gly Val Leu Met Glu Leu Gln Asp Cys Ala Leu Pro Leu Leu
255     50                  55                  60
256 Lys Asp Val Ile Ala Thr Asp Lys Glu Glu Ile Ala Phe Lys Asp Leu
257 65                  70                  75                  80
258 Asp Val Ala Ile Leu Val Gly Ser Met Pro Arg Arg Asp Gly Met Glu
259                 85                  90                  95
260 Arg Lys Asp Leu Leu Lys Ala Asn Val Lys Ile Phe Lys Cys Gln Gly
261             100                 105                 110
262 Ala Ala Leu Asp Lys Tyr Ala Lys Lys Ser Val Lys Val Ile Val Val
263         115                 120                 125
264 Gly Asn Pro Ala Asn Thr Asn Cys Leu Thr Ala Ser Lys Ser Ala Pro
265     130                 135                 140
266 Ser Ile Pro Lys Glu Asn Phe Ser Cys Leu Thr Arg Leu Asp His Asn
267 145                 150                 155                 160
268 Arg Ala Lys Ala Gln Ile Ala Leu Lys Leu Gly Val Thr Ser Asp Asp
269                 165                 170                 175
270 Val Lys Asn Val Ile Ile Trp Gly Asn His Ser Ser Thr Gln Tyr Pro
271             180                 185                 190
272 Asp Val Asn His Ala Lys Val Lys Leu Gln Ala Lys Glu Val Gly Val
273         195                 200                 205

VERIFICATION SUMMARY DATE: 04/23/2002
PATENT APPLICATION: US/10/040,895 TIME: 15:35:40

Input Set : A:\Tb5072.txt
Output Set: N:\CRF3\04232002\J040895.raw

L:15 M:271 C: Current Filing Date differs, Replaced Current Filing Date